Starburst
Overview
Starburst provides starter datasets to test out its diverse querying features. The discussion below uses those tables so that you can easily try them on your own.
Defining a Data Source
Overview
A Data Source is a Qarbine component responsible for retrieving data from somewhere. At a high level it has a name, a description and some arbitrary query string which when sent to the associated Qarbine Data Service endpoint returns some data. The overall execution flow for an analysis, including the optional prompt component, is shown below.
A single data source can be referenced by name from multiple Qarbine template components. This enables a single point of change when perhaps, an index is added, or some other query tweak is necessary. The alternative is to attempt to find all templates impacted by a schema or index change for example. This component reusability is especially beneficial when team members have varying roles and skills.
This data source can be found at “example/Starburst/burstBank/Customer products analysis”.
The data source query specification below retrieves the first 10 rows based on credit card opening date of customers in the state of NJ with a FICO score between 500 and 700.
select c.*, a.*, p.*
from burstBank.customer c, burstBank.account a, burstBank.product_profile p
where state= 'NJ'
and fico between 500 and 700
and c.custkey = a.custkey
and c.custkey = p.custkey
order by cc_open_date
limit 10
Sample results are shown below.
The details of the first element are shown below
![]() | ![]() |
Managing Answer Set Size
The default maximum number of rows starts off at 25 for a new data source. This is useful to evolve a query from a concept to one that you have verified returns the desired answer set. As noted, any native way of limiting an answer set size is the preferred approach. This setting is in the component dialog as shown below and also accessible by clicking the ‘Gear’ icon.
Once you are done drafting you can adjust this parameter. A “0” indicates there is no maximum. A number greater than 0 indicates to limit the final answer set size to that number of rows. This answer set truncation comes after any native query limit. So, if the answer set from the data endpoint is quite large, that content has to be returned to the Qarbine host. It then may truncate the number of rows. It is best to truncate at the query level (i.e., use a limit) to reduce the content sent from the data endpoint to the Qarbine host in the first place.
Adjusting Maximum Rows
Recall the default maximum rows at the component level is 25. When you are satisfied with your query you can change that setting by clicking.
Adjust the setting to “0” indicating no Qarbine answer set truncation.
Click
Prompt Integration
Overview
Qarbine prompts provide a way to obtain runtime values and variables for data source and template execution. To avoid hardcoding, prompts can use macro formulas to run queries which populate list widgets. Prompts are defined in a no code manner using the Prompt Designer. Shown below is the execution flow when there is a Prompt component.
Sample Prompt
The Prompt Designer supports a large variety of input widgets including entry fields, check boxes, radio button groups, sliders, and file input. Shown below is a prompt for single year value.
Clicking
propagates the state and FICO range variables along to data sources and templates for their consumption.
This Prompt can be found at “example/Starburst/burstBank/Prompt for customer analysis 1”. The prompt has the following elements.
The first prompt element has these main properties.
The second prompt element has these main properties.
The list’s elements are obtained using a data source containing the query
select distinct state from burstBank.customer c order by 1
The third prompt element has these main properties.
By having the default value be a list of 2 values both the start and end values of the slider can be changed. The resulting variables are ficoStart and ficoEnd in this example.
Adjusting the Data Source to Use the Prompt
The data source defined above uses a hard coded state and FICO range. Load that data source and follow the steps below to use a prompted date range instead of a hard coded one.
The following new data source can be found at “example/Starburst/burstBank/Customer products analysis with variables”.
If you load the previously defined data source then immediately do a “Save as” action to create a new data source. Give it a name such as “Customer products analysis with variables”. You will use this data source in the following section describing template definition.
There are 2 main differences from the first data source:
- The query spec has variables for a state value and a FICO score range.
- A prompt is referenced to obtain values for these variables.
The adjusted query specification is shown below.
select c.*, a.*, p.*
from burstBank.customer c, burstBank.account a, burstBank.product_profile p
where state= @state
and fico between @ficoStart and @ficoEnd
and c.custkey = a.custkey
and c.custkey = p.custkey
order by cc_open_date
limit 10
Associated the prompt with the data source by first clicking to open the properties dialog.
Activate the Prompt tab.
Choose the drop down option shown below.
Since the prompt was just defined use the recents dialog button to choose the prompt reference. The desired result is shown below.
Save the new data source updates by clicking
To run the data source click
The prompt is presented.
Adjust the values as desired.
Click
Sample results are shown below.
Defining an Analysis Template
Overview
A template defines how to process the data being retrieved from Data Source queries and other data expressions. It also defines formulas, formatting options, and other analysis and presentation options. The overall execution flow for an analysis, including the optional prompt component, is shown below
Using the Template Designer
This template can be found at “example/Starburst/burstBank/Customer products analysis with variables”. Sample output is shown below.
Review its properties by clicking on .
Notice it is associated with the data source defined above. Recall that data source is associated with the prompt described above as well.
The formatting options are shown below.
The general layout of the template is shown below.
The report header area displays a logo along with the runtime variables for the state and the FICO score range. The ‘@’ prefix indicates a variable reference.
As processing iterates through the rows the first body line shows in bold the customer name and address information. The ‘#’ prefix indicates a field of the current row.
The line itself has the following property to further help visually separate the customers.
On the 2nd body line the marital status boolean is shown as a checkbox using the custom boolean cell.
These cell definition techniques are used amongst the various body cells.
The first group summary line has a wide cell with an underline for visual clarity.
The last group summary line has the following cells.
The 2nd cell sums the mortgage balances across all of the rows and displays the value without any cents.
The 4th cell sums the credit card balances across all of the rows.
Sample Analysis with a Chart
Overview
Consider the analytic results shown below.
This template is located at “example/Starburst/burstBank/Credit card analysis by state top 10”.
A chart can be added to display the data. The chart collects the data as the template body section is processed. The chart cell is defined in the report summary section. By placing a page break “before” on the first report summary line and checking “Make last page the first page” we can show the chart before the details on the resulting output. This is explained in more detail below.
The result is shown below.
Template Technique
This template is located at “example/Starburst/burstBank/Credit card analysis by state top 10 with chart”. The template’s formatting options of interest are:
The first report summary line has
The 2nd report summary line has the custom chart cell. Its formula to provide the chart title is
= "Total Balance by State"
The cell output height is shown below.
The main options of interest are:
The formatting options of interest are:
The supporting data source is at “example/Starburst/burstBank/Credit card analysis by state top 10”. Its query specification is
SELECT c.state,
COUNT(a.custkey) AS number_of_accounts,
SUM(a.cc_balance) AS total_cc_balance,
AVG(a.cc_balance) AS average_cc_balance
FROM burstbank.customer c
JOIN burstbank.account a ON c.custkey = a.custkey
WHERE a.cc_status = 'open' and c.country = 'US'
GROUP BY c.state
ORDER BY total_cc_balance desc
limit 10